and Masatsune Kainosho


The C-terminal domain (CTD) of the severe acute respiratory syndrome 
coronavirus (SARS-CoV) nucleocapsid protein (NP) contains a potential 
RNA-binding region in its N-terminal portion and also serves as a 
dimerization domain by forming a homodimer with a molecular mass of 
28 kDa. So far, the structure determination of the SARS-CoV NP CTD in 
solution has been impeded by the poor quality of NMR spectra, especially 
for aromatic resonances. We have recently developed the stereo-array 
isotope labeling (SAIL) method to overcome the size problem of NMR 
structure determination by utilizing a protein exclusively composed of 
stereo- and regio-specifically isotope-labeled amino acids. Here, we 
employed the SAIL method to determine the high-quality solution structure 
of the SARS-CoV NP CTD by NMR. The SAIL protein yielded less crowded 
and better resolved spectra than uniform ¹³C and ¹⁵N labeling, and enabled
the homodimeric solution structure of this protein to be determined. The 
NMR structure is almost identical with the previously solved crystal 
structure, except for a disordered putative RNA-binding domain at the 
N-terminus. Studies of the chemical shift perturbations caused by the 
binding of single-stranded DNA and mutational analyses have identified 
the disordered region at the N-termini as the prime site for nucleic acid 
binding. In addition, residues in the β-sheet region also showed significant
perturbations. Mapping of the locations of these residues onto the helical 
model observed in the crystal revealed that these two regions are parts of 
the interior lining of the positively charged helical groove, supporting the 
hypothesis that the helical oligomer may form in solution. 

Introduction


Severe acute respiratory syndrome (SARS) is a 
recently emergent disease caused by the SARS- 
associated coronavirus (CoV).'* The SARS-CoV 
nucleocapsid protein (NP) packages the viral geno- 
mic RNA into a ribonucleoprotein complex and is 
crucial for the assembly of infectious virus particles. 
Based on comparative NMR studies of SARS-CoV 
NP deletion constructs, the protein contains two 
structural domains: the N-terminal domain (NTD; 
containing residues 45-181) and the C-terminal 
domain (CTD; containing residues 248-365),° 
flanked by the long, disordered N- and C-termini 
and the linker sequence (Fig. 1a). The NTD
reportedly acts as a putative RNA-binding domain, 
and the CTD functions as a dimerization domain.” 
Recently, the CTD has also been shown to bind 
nucleic acids with high affinity.” Structural investi- 
gations of the isolated SARS-CoV NP CTD have 
been performed by NMR and X-ray crystallography. 
In a previous NMR study, the topological structure 
of the isolated SARS-CoV NP CTD as a homodimer 
was elucidated based on limited intersubunit and 
intrasubunit nuclear Overhauser effects (NOEs).° 
This topology was confirmed by the subsequently 
reported crystal structures of the SARS-CoV NP 
CTD and of a shorter construct spanning residues 
270-370." The crystal structure of the CTD 
revealed that residues 248-280 form a positively 
charged patch, which acts as a putative oligonucleo- 
tide-binding region. The patch also participates in 
intermolecular and intramolecular interactions 
within the crystal, resulting in the formation of an 
octameric asymmetric unit. However, previous bio- 
chemical and biophysical studies have shown that 
the CTD exists solely as a dimer in solution.° These 
findings motivated us to investigate the nature of 
the CTD in solution. Initial attempts at the complete 
structure elucidation of the SARS-CoV NP CTD 
through NMR were impeded by its short T2 relaxation
times and significant peak overlaps.°

Recently, we have developed the stereo-array 
isotope labeling (SAIL) method, which utilizes pro- 
teins exclusively composed of stereo- and regio- 
specifically labeled amino acids.” Compared to 
conventional uniform ¹³C,¹⁵N isotopic labeling, the
quality of the spectra of the SAIL sample was 
sufficiently improved, so that a high-resolution 
solution structure determination could be per- 
formed. The sharpened resonance lines and reduced 
peak overlap with the SAIL method are due to the 
selective deuteration of many nonlabile protons. The 
remaining protons with known stereo-specific 
assignments provide plentiful information about 
the structure of the protein. In this study, we per- 
formed an NMR study of the SARS-CoV NP CTD 
with the help of the SAIL method. Compared to the 
protein uniformly labeled with ¹³C and ¹⁵N, the
SAIL sample significantly improved the quality of 
the NMR spectra, to the extent that a high-resolution 
structure determination could be performed. The 
tertiary structure obtained by NMR is almost iden- 
tical with that of the crystal structure, except for 
a disordered putative RNA-binding domain at the 
N-terminus. We further applied NMR, mutation 
analyses, and electrophoretic mobility shift assays 
(EMSAs) to pinpoint the nucleic-acid-binding site. 
The active site thus identified agrees well with the 
helical ribonucleoprotein model suggested by the 
crystal structure. 


Results 


Preparation of the SAIL sample of the SARS-CoV 
NP CTD 


The preparation of proteins composed of SAIL 
amino acids requires cell-free expression to effi- 
ciently incorporate the SAIL amino acids into the 
protein without them being affected by metabolic 
scrambling in living cells. The expression of the 
SARS-CoV NP CTD was initially examined in a 
small-scale reaction (Fig. 1b). Subsequently, the 
¹H-¹⁵N heteronuclear single-quantum coherence
(HSQC) spectrum of the ¹⁵N-labeled SARS-CoV NP CTD
produced by cell-free expression was compared 
with that produced by in vivo expression. These 
spectra were identical, thus confirming that the 
structures of the proteins produced by cell-free and 
in vivo expressions are identical (data not shown). 


Fig. 1. Preparation of the SARS- 
CoV NP CTD. (a) Schematic dia- 
gram of the domain architecture of 
the SARS-CoV NP. (b) SDS-PAGE of 
the cell-free reaction mixture for the 
SARS-CoV NP CTD. The left lane 
shows the molecular weight markers.
The band corresponding to the
monomer of the SARS-CoV NP 
CTD is labeled with an arrow. 

The final sample of the SAIL SARS-CoV NP CTD
was produced by using the Escherichia coli cell-free 
protein synthesis system, with optimizations for the 
production of labeled NMR samples.’ 


The SAIL method improves the quality of the 
NMR spectrum 


The ¹H-¹³C constant time (CT) HSQC spectra of
an aliphatic region were compared between uni- 
formly labeled (UL) and SAIL samples under the 
same conditions. In the case of the UL sample, the 
signals were prone to overlap between diastereo- 
topic pairs, and some signals were severely broa- 
dened beyond detection in the methylene region 
(Fig. 2a). In contrast, the corresponding spectrum 
from the SAIL sample had much better quality than 
that from the UL sample (Fig. 2b). The signal/noise 
ratios for the SAIL sample were several times higher 
than those for the corresponding UL sample, consistent
with previous results for calmodulin and the
maltodextrin-binding protein (Fig. 2c).° For the
SARS-CoV NP CTD, the number of peaks to be 
observed theoretically in the ¹H-¹³C CT HSQC
spectra, including methyl, methylene, and methine
protons, decreased from 517 for the UL sample to 
343 for the SAIL sample, greatly simplifying the 
analytical process. 

For the aromatic region, improvement due to the 
use of the SAIL method was even more striking. The 
aromatic rings of UL Phe and Tyr contain five and
four ¹³C-¹H pairs, respectively (Fig. 3a). In the case
of SAIL, the six-membered aromatic rings are
labeled by alternating ¹³C-¹H and ¹²C-²H moieties
(¹³C at the ε and γ positions; ¹²C at the δ and ζ
positions) (Fig. 3b).° In the UL sample, signals for
the ¹H-¹³C moieties at the δ, ε, and ζ positions of
Phe, and at the δ position of Tyr, are overlapped
around 131 ppm in the carbon dimension, thus 
resulting in severe spectral crowding (Fig. 3c). In 
contrast, the corresponding region for the SAIL 
sample was much simpler due to the presence of 

Fig. 2. Comparisons of NMR spectra for the SARS-CoV NP CTD between UL and SAIL in the methylene region.
Aliphatic region of ¹H-¹³C CT-HSQC for UL (a) and SAIL (b) SARS-CoV NP CTD. Both spectra were acquired under the
same conditions. The sample concentration was 0.5 mM. In (b), assignments for the SAIL sample are labeled. (c) Cross-
sections from (a) (red) and (b) (black). The peak scales are identical between the UL and SAIL spectra.

Fig. 3. Comparisons of NMR spectra for the SARS-CoV NP CTD between UL and SAIL in the aromatic region.
Chemical structures of the aromatic rings for UL (a) and SAIL (c) phenylalanine. (b and d) Phenylalanine signals of ¹H-¹³C
HSQC for UL (b) and SAIL (d) SARS-CoV NP CTD. (e and f) Tyrosine signals of ¹H-¹³C HSQC for UL (e) and SAIL (f)
SARS-CoV NP CTD. To demonstrate the absence of the ¹JCC coupling of aromatic rings for SAIL phenylalanine and
tyrosine residues, all ¹H-¹³C HSQC spectra for the aromatic regions were recorded without the CT technique.


signals exclusively from the ¹H-¹³C moieties at the
ε position of Phe. The longer transverse relaxation
times of the SAIL sample led
to the detection of resonances that were severely
broadened beyond detection with the UL sample
(Fig. 3d). Since the one-bond ¹³C-¹³C couplings present in the
UL protein are absent in the SAIL protein, as
shown in the Tyr Hε-Cε region (Fig. 3e and f), the
CT technique with a long evolution time (~17 ms) 
is thus not required for the SAIL sample.'® 
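
As a rough consistency check on the ~17 ms figure: constant-time evolution periods are commonly set to a multiple of 1/¹JCC, so that the homonuclear coupling is refocused. The sketch below assumes an aromatic one-bond ¹³C-¹³C coupling of about 59 Hz, a typical textbook value that is not given in this paper; the function and constant names are illustrative only.

```python
def constant_time_period_ms(j_cc_hz: float, n: int = 1) -> float:
    """Return a constant-time delay of n / J_CC, expressed in milliseconds."""
    if j_cc_hz <= 0:
        raise ValueError("coupling constant must be positive")
    return 1000.0 * n / j_cc_hz


# Assumption: typical aromatic one-bond 13C-13C coupling (not from the paper).
ASSUMED_AROMATIC_J_CC_HZ = 59.0

ct_ms = constant_time_period_ms(ASSUMED_AROMATIC_J_CC_HZ)
print(f"constant-time period: {ct_ms:.1f} ms")
```

With ¹²C neighbours on either side of each ¹³C-¹H pair in the SAIL rings, there is no such coupling to refocus, which is why this delay can be dropped.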

With the benefits mentioned above, we were able 
to acquire a set of NMR spectra with high sensitivity, 
and the expected chemical shifts were assigned to 
91.2% completeness for the SAIL sample. 


Solution structure of SARS-CoV NP CTD 


Even though the SARS-CoV NP CTD existed as an 
octamer in the asymmetric unit in the crystal, abun- 
dant evidence suggests that it exists as a homodimer 
in solution.*° We manually assigned some of the 
intersubunit NOE peaks from the SARS-CoV NP 
CTD based on previous NMR work.® These manu- 
ally assigned intersubunit distance restraints were 


included in the combined automated assignments of 
the NOE peaks and the structure calculation by 
CYANA." NOE-derived distance restraints totalling 
2615 were obtained. Out of 100 structures calcu- 
lated, the 20 structures with the lowest target 
function values were selected and energy-refined 
as the final structures of the SARS-CoV NP CTD. 
The structural statistics for the NMR structure are 
summarized in Table 1. Consistent with the pre- 
viously reported topological structure of the SARS- 
CoV NP CTD, the NMR structure of the SARS-CoV 
NP CTD adopts a domain-swapped homodimer 
conformation (Fig. 4a and b). The resulting struc- 
tures exhibited good convergence for both the back- 
bone and the side chains when the regions between 
residues 260 and 365 of each dimer were superimposed
(Fig. 4a). The model has a disordered
region spanning residues 248-259 and protruding
from the dimer core. Based on heteronuclear NOE
measurements, this disorder appears to be the result
of internal dynamics, which prevent the detection of
long-range NOEs. ¹⁵N-{¹H} NOE experiments recorded
for the SARS-CoV NP CTD showed that the hetero- 




Structure of SAIL SARS-CoV N Protein 


Table 1. Structural statistics for the NMR structure of
SARS-CoV CTD

Parameter                                           Quantity
Completeness of chemical shift assignments (%)      91.2
Total NOE upper distance bound restraints           2615
  Short range (|i−j| ≤ 1)                           1313
  Medium range (1 < |i−j| < 5)                      586
  Long range (|i−j| ≥ 5)                            716
  Intermolecular                                    260
Dihedral angle restraints (φ and ψ)                 236
CYANA target function (Å²)                          2.55
AMBER energy (kcal/mol)                             −6106
Ramachandran plot statistics (%)
  Most favored regions                              84.6
  Additionally allowed regions                      14.8
  Generously allowed regions                        0.6
  Disallowed regions                                0.0
Backbone RMSD for residues 260-365 (Å)              0.77
All heavy atom RMSD for residues 260-365 (Å)        1.19


nuclear NOE values for this region are smaller than 
those for the structured region, indicating that the 
two N-termini are flexible in solution (data not 
shown). The secondary structural elements of the 
CTD in solution were defined based on the DSSP 
algorithm,'? and they corresponded well to those 
identified in the previous NMR study (Fig. 4b and c).° 
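
The selection step described above (100 calculated conformers, of which the 20 with the lowest target function values are retained) can be sketched as follows. The target function values here are random placeholders, not CYANA output, and `select_lowest` is a hypothetical helper rather than part of CYANA.

```python
import random


def select_lowest(conformers, n_keep=20):
    """Return the n_keep conformers with the smallest target function value."""
    return sorted(conformers, key=lambda c: c["target_function"])[:n_keep]


# Placeholder ensemble: 100 conformers with made-up target function values.
random.seed(0)
conformers = [
    {"id": i, "target_function": random.uniform(2.0, 12.0)}
    for i in range(100)
]

ensemble = select_lowest(conformers, n_keep=20)
kept_ids = {c["id"] for c in ensemble}
rejected = [c for c in conformers if c["id"] not in kept_ids]

mean_tf = sum(c["target_function"] for c in ensemble) / len(ensemble)
print(f"kept {len(ensemble)} conformers, mean target function {mean_tf:.2f}")
```

In the actual workflow, the retained conformers are then energy-refined before the statistics in Table 1 are computed.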

Prior to our NMR investigation, the crystal struc- 
tures of constructs spanning residues 270-370 
[Protein Data Bank (PDB) ID 2gib] and residues 
248-365 (PDB ID 2cjr) were elucidated.”’” The 
overall folds and secondary structure arrangements 
are very similar between the crystal structures and 
the NMR structure (Fig. 4a and c). The backbone 
RMSD between the protomers of the mean NMR 
structure and the crystal structure of the CTD 
spanning residues 248-365 is 1.45 Å if residues
260-319 and 333-358 are superimposed. If the NMR
structure is superimposed with the crystal structure
spanning residues 270-370, then the backbone
RMSD between the protomers is 1.26 Å for the
regions of residues 274-319 and 333-358. There are
two differences, however: first, in both crystal structures,
the β-sheet is distorted around residues 320-332
compared to that of the NMR structure (upper
left of Fig. 5a and b). Second, the two N-termini 
(residues 248-259) protruding from the dimer core 
are disordered in the NMR structure, whereas, in the 
crystal structure, they are involved in a number of 
intramonomer and intradimer contacts and are 
more rigid (Fig. 4a). The disorder of the N-termini 
in solution was further supported by the analysis of 
backbone amide-exchange rates (data not shown). 
We suspect that at least some of these differences 
are likely to result from the different solvent con- 
ditions used for crystallization and/or crystal- 
packing effects. 
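
The RMSD values quoted above depend on which residues are used for the superposition. The following is a minimal sketch of that calculation, assuming the Kabsch algorithm for the optimal rotation and a single coordinate per residue (the paper uses backbone atoms, and the software actually used is not stated here). The coordinates are synthetic and the helper names are hypothetical.

```python
import numpy as np


def select_residues(coords_by_residue, ranges):
    """Stack coordinates for residues inside the inclusive ranges, in order."""
    picked = [coords_by_residue[res]
              for lo, hi in ranges
              for res in range(lo, hi + 1)]
    return np.array(picked, dtype=float)


def kabsch_rmsd(mobile, reference):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    P = P - P.mean(axis=0)            # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)  # covariance matrix decomposition
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Fit ranges taken from the text: residues 260-319 and 333-358.
FIT_RANGES = [(260, 319), (333, 358)]

# Synthetic stand-in coordinates for residues 248-365 (not real structure data).
rng = np.random.default_rng(0)
reference = {res: rng.normal(scale=10.0, size=3) for res in range(248, 366)}

# "Mobile" copy: the same points rotated 40 degrees about z and translated.
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
mobile = {res: rot_z @ xyz + np.array([5.0, -3.0, 2.0])
          for res, xyz in reference.items()}

fit_mobile = select_residues(mobile, FIT_RANGES)
fit_reference = select_residues(reference, FIT_RANGES)
rmsd = kabsch_rmsd(fit_mobile, fit_reference)
print(f"RMSD over {len(fit_mobile)} fitted residues: {rmsd:.6f}")
```

Because the mobile set is only a rigid-body copy of the reference, the RMSD here is essentially zero; excluding residues 320-332 from the fit, as in the text, keeps a locally distorted region from inflating the value for the rest of the fold.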


Nucleic-acid-binding sites of SARS-CoV NP CTD 


Deletion studies revealed that residues 248-280 
are essential for the nucleic-acid-binding activity
of the SARS-CoV NP CTD.° However, the exact 
residues involved in the binding had not been 


identified. To identify these residues, we conducted 
chemical shift displacement (CSD) studies by titrating
10-mer (dT10) or 20-mer (dT20) poly-deoxythymidine
(poly-dT) single-stranded DNA (ssDNA) into
the protein samples. ssDNAs were used throughout
this study as surrogates of single-stranded RNA.
Titration of dT10 or dT20 into the protein sample
caused a concentration-dependent gradual shift of 
some resonances, instead of the appearance of a new 
set of resonances, suggesting that the binding occurs 
in the fast-exchange regime, which is indicative of a 
low-affinity nucleic-acid-binding protein.'* For 
dT10, significant chemical shift changes were localized
primarily in the N-terminal region, particularly
K250, E253, A254, S256, K257, and K258 (Fig. 6a and
b), while the majority of the other resonances were
scarcely affected. This result suggests that dT10
binds to the SARS-CoV NP at the N-terminal flexible
segment, without affecting the overall structure of
the protein (Fig. 4). The dissociation constant estimated
from the CSD studies at various dT10 concentrations
is Kd ≈ 30 μM.
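
In the fast-exchange regime, the observed shift is the population-weighted average of the free and bound shifts, so a dissociation constant can be estimated by fitting the titration curve. The sketch below assumes a 1:1 binding model, a combined ¹H/¹⁵N displacement with a ¹⁵N weighting factor of 0.2, and a simple grid search; none of these details are specified in this section, and the titration points are synthetic.

```python
import math


def combined_csd(delta_h_ppm, delta_n_ppm, n_scale=0.2):
    """Combined 1H/15N shift displacement; n_scale is an assumed weighting."""
    return math.sqrt(delta_h_ppm ** 2 + (n_scale * delta_n_ppm) ** 2)


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding (all concentrations in uM)."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / (2.0 * p_total)


def fit_kd(p_total, l_totals, observed, kd_grid):
    """Grid search over Kd; the amplitude is solved by linear least squares."""
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_total, l, kd) for l in l_totals]
        d_max = sum(x * y for x, y in zip(f, observed)) / sum(x * x for x in f)
        sse = sum((d_max * x - y) ** 2 for x, y in zip(f, observed))
        if best is None or sse < best[0]:
            best = (sse, kd, d_max)
    return best[1], best[2]


# Synthetic, noise-free titration (hypothetical concentrations, in uM).
P_TOTAL = 100.0
L_TOTALS = [10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
TRUE_KD, TRUE_DMAX = 30.0, 0.15
observed = [TRUE_DMAX * fraction_bound(P_TOTAL, l, TRUE_KD) for l in L_TOTALS]

kd_fit, dmax_fit = fit_kd(P_TOTAL, L_TOTALS, observed,
                          [float(k) for k in range(1, 201)])
print(f"fitted Kd = {kd_fit:.0f} uM, maximal CSD = {dmax_fit:.3f} ppm")
```

A fit of this kind is only reliable when the titration approaches saturation, which is the practical limitation noted below for dT20.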

Similarly, dT20 also generated significant chemical
shift changes in the same set of N-terminal residues;
however, it also induced CSD of the resonances of
R320, H335, and A337, which are located in the
β-sheet of the CTD dimer (Fig. 6b and d). Since these
residues are not affected when the SARS-CoV NP
CTD is bound to dT10 (Fig. 6a and c), we can rule out
the effect of long-range structural alterations
induced by the binding of oligonucleotides to the
N-termini. Our results suggest that R320, H335, and
A337 also contribute to nucleic acid binding and
could be part of the binding site. This observation is
unexpected, as these residues are sequentially and
structurally distant from the N-terminal region. It
should be noted that we could only add dT20 up
to a [dT20]/[CTD monomer] ratio of 1:4. A higher
ratio of DNA caused precipitation. As such, we
could not obtain a reliable dissociation constant for
the complex, and the chemical shift perturbation
shown in Fig. 6b may not be the maximum change
expected at a saturating dT20 concentration for the
complex.


Mutagenesis of the nucleic-acid-binding sites of 
SARS-CoV NP CTD 


To further quantify the relative contribution of 
positively charged residues to oligonucleotide bind- 
ing by the SARS-CoV NP CTD, we produced double 
mutants targeting K257/K258 and measured the
effect of the mutations on the apparent dissociation
constant (Kd) for fluorescently labeled dT20
by EMSA. Since the mutation sites are
located in the inherently flexible regions, the mutations
did not cause any structural perturbations to
other parts of the CTD dimer, as monitored by ¹⁵N
HSQC spectra (Fig. 7). The EMSA results are
summarized in Table 2. We found that the charge- 
preserving K257R/K258R mutant did not affect the 
apparent binding affinity, compared to the wild- 
type construct, whereas the K257Q/K258Q mutant 

Fig. 4. NMR structure of the SARS-CoV NP CTD. (a) Superposition of the 20 lowest-energy NMR structures of the
SARS-CoV NP CTD and the corresponding crystal structure spanning residues 248-365.° The two subunits in each 
structure are in orange and magenta for the NMR structure, and in red and green for the crystal structure. (b) Ribbon 
diagram of the solution structure of the SARS-CoV NP CTD. Secondary structure elements are labeled for one subunit. 
(c) Sequence of the SARS-CoV NP CTD. The secondary structures for the NMR structure and for the crystal structure 
spanning residues 248-365 are shown above the sequence, with red cylinders for α-helices and yellow arrows for
β-strands.


showed a 5-fold reduction in affinity, although 
binding was not completely abolished (Fig. 8, Table 
2). Our results suggest that the positive charges 
at positions 257 and 258 of the SARS-CoV NP CTD 
are significant determinants of its binding affinity 
towards oligonucleotides. 

Similarly, we also generated the R320A and 
H335A mutants. The ¹⁵N HSQC spectra of these
mutants revealed that, in both cases, the structural 


perturbations are mostly limited to the regions 
adjacent to the Arg320 and His335 mutation sites 
(Fig. 9). However, the R320A mutation had a 
chemical shift perturbation larger than that of the 
H335A mutation, probably because the Arg side 
chain made more contacts with adjacent residues. 
The changes in the respective apparent dissociation 
constants were measured by EMSA. Mutations at 
this secondary binding site lowered the apparent 

Fig. 5. Superposition of the NMR and crystal structures of a CTD monomer of SARS-CoV NP. The mean NMR
structure of a CTD monomer (blue) is superposed on the corresponding crystal structure spanning residues 248-365 (a)
(red; PDB code 2cjr) and on that encompassing residues 270-370 (b) (light green; PDB code 2gib).’ In (a), the two
structures are superposed on the regions of residues 260-319 and 333-358, where the backbone RMSD between them is
1.45 Å. In (b), the two structures are superposed on the regions of residues 274-319 and 333-358, where the backbone
RMSD is 1.26 Å.


affinity towards dT20 by about twofold (Fig. 8, Table
2). Although the effects were not as remarkable
as those of the mutations near the N-terminus, the
loss of binding affinity was still measurable. Our
results are in agreement with the hypothesis that the
β-sheet region of the SARS-CoV NP CTD is part of
the nucleic-acid-binding site.
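
The fold changes quoted in this section follow directly from the apparent Kd values in Table 2. The sketch below recomputes them and evaluates a Hill-type binding curve over the two-fold protein dilution series described for Fig. 8. The Hill form is an assumption (the fitting equation is not given in this section), and the function and variable names are illustrative.

```python
def hill_fraction_bound(protein_um, kd_um, hill_n):
    """Fraction of probe bound under a Hill-type model (concentrations in uM)."""
    if protein_um < 0 or kd_um <= 0:
        raise ValueError("concentrations must be non-negative, Kd positive")
    x = protein_um ** hill_n
    return x / (kd_um ** hill_n + x)


# Apparent Kd (uM) and Hill coefficient, central values from Table 2.
TABLE2 = {
    "Wild type": (17.19, 0.82),
    "K257R/K258R": (13.96, 0.92),
    "K257Q/K258Q": (94.42, 0.83),
    "R320A": (33.81, 0.85),
    "H335A": (38.07, 1.01),
}

wt_kd = TABLE2["Wild type"][0]
fold_change = {name: kd / wt_kd for name, (kd, _n) in TABLE2.items()}

# Two-fold series from lane 1 (439 nM) to lane 11, expressed in uM.
lanes_um = [0.439 * 2 ** i for i in range(11)]

for name, (kd, n) in TABLE2.items():
    top_lane = hill_fraction_bound(lanes_um[-1], kd, n)
    print(f"{name:12s} Kd ratio vs wt = {fold_change[name]:.2f}, "
          f"fraction bound in lane 11 = {top_lane:.2f}")
```

The ratios come out at roughly 5.5 for K257Q/K258Q and about 2 for R320A and H335A, matching the "5-fold" and "about twofold" descriptions in the text, while the charge-preserving K257R/K258R mutant stays close to wild type.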


Discussion 


SAIL as an emergent tool for solution structure 
determination 


The SAIL method is characterized by a sophisti- 
cated labeling pattern that is highly optimized for 
structure determination, in contrast to other preex- 
isting isotope labeling techniques. Conventional 
strategies utilizing selective protonation under a 
predeuterated background lead to a compromise 
between increased intensity of the labeled protons 
and loss of information about the deuterated ones. 
For instance, while the selective protonation of the 
methyl protons of Ile, Leu, and Val in a deuterated 
background is very effective for the observation of 
methyl protons, one cannot obtain any information 
on the remaining side chains.'? The SAIL method, 
on the other hand, combines the merits of increased 
intensity through the deuteration of redundant 
protons and preservation of the structural informa- 
tion by leaving key protons intact. Of particular 
interest is the use of SAIL aromatic residues, which 
was first demonstrated with calmodulin.'” Assign- 
ments of chemical shift and NOE peaks involving 
aromatic signals are indispensable for the high- 
quality structure determination of proteins, since 


aromatic residues are often part of the folding core. 
In the case of the SARS-CoV NP CTD, the aromatic 
resonances of the UL protein were severely over- 
lapped, making their assignment difficult, if not 
impossible (Fig. 3b and e). The introduction of 
aromatic SAIL amino acids in the sample resolved 
this problem (Fig. 3d and f), ultimately leading to the 
elucidation of the SARS-CoV NP CTD solution 
structure. This study is the first case to have 
demonstrated that the use of the SAIL phenylala- 
nine and tyrosine residues was effective in the NMR 
spectral analysis of a large protein. This is also the 
first instance of a homodimeric protein structure to 
have been solved by the SAIL approach. The SAIL 
method becomes more effective with increasing 
molecular weight and allows for the structures of 
larger proteins with even more intricate features to 
be solved.® 


Differences between the solution structure and 
the crystal structures of SARS-CoV NP CTD 


There are two distinct differences between the 
structure of the SARS-CoV NP CTD in solution and 
the structure of the SARS-CoV NP CTD in the 
crystal. The first is the orientation of the short turn in 
the β-sheet: in the crystal, the short turn is closer to
the N-terminal residues of the same protomer than 
in the solution structure, resulting in a more 
compact crystal structure (Fig. 5). This is observed 
in all of the crystal structures of the SARS-CoV NP 
CTD solved to date, regardless of the space group
and the construct.”” It is possible that crystal 
packing is responsible for the compactness. On the 
other hand, the less compact solution structure of 
the short hairpin turn could be the result of the 

Fig. 6. CSD of SARS-CoV NP CTD titrated with poly-dT ssDNA. (a and b) Variation of the CSD of SARS-CoV NP CTD titrated with dT10 (a) or dT20 (b). The dashed lines in (a) and (b)
represent the cutoff for significant displacements. (c and d) Spatial locations of residues (red) with CSD values larger than the cutoff value upon titration with dT10 (c) or dT20 (d). The two
monomers are in green and blue, respectively.

Fig. 7. Structure perturbation of K257/K258 double mutants. (a) Overlay of ¹⁵N-edited HSQC spectra from wild-type
CTD (blue), K257R/K258R (green), and K257Q/K258Q (red) double mutants. Affected resonances are identified by their
respective residue types and numbers in the wild-type protein. These are mapped onto the ribbon structure of the CTD
dimer in (b).


more dynamic character of the CTD in an aqueous 
environment. 

The second difference lies in the conformation of 
the N-termini. The N-termini in the crystal structure 
of the SARS-CoV NP CTD, spanning residues 248- 
365, form an ordered conformation anchored by 


intramolecular and intermolecular interactions 
between various adjacent residues.” However, the 
N-termini in solution are disordered and, in agree- 
ment with our previous studies,° lack a short helix 
formed by residues 259-263 in the crystal structure 
(Fig. 4). The CTD is arranged as an octamer within 

Table 2. Binding coefficients for dT20 to SARS-CoV NP
CTD

Protein          Apparent Kd (μM)    Hill coefficient
Wild type        17.19 ± 1.51        0.82 ± 0.05
K257R/K258R      13.96 ± 0.92        0.92 ± 0.05
K257Q/K258Q      94.42 ± 5.56        0.83 ± 0.04
R320A            33.81 ± 2.16        0.85 ± 0.04
H335A            38.07 ± 2.25        1.01 ± 0.05


the unit cell of the crystal, and this short helix could 
be the result of different solvent conditions, crystal 
packing, and/or the oligomerization process.'° 
Residues within and adjacent to the short helix par- 
ticipate in intramolecular and intermolecular inter- 
actions within the crystal, and contribute to the 
formation of the octamer. It is possible that the 
short helix is selectively stabilized by the formation 
of new protein-protein contacts in the crystal 
octamer, similar to those observed for the binding 
of intrinsically disordered proteins to their targets. !7 
Transient formation of the short helix is also ob- 
served in a few of the conformers within the NMR 
structural ensemble. 


Relevance to ribonucleoprotein packaging 


In the crystal structure of SARS-CoV NP 43-365, we 
have previously found that the CTD forms an 


octamer.” The packaging of the octamers in the
asymmetric unit of the crystal results in two parallel 
basic helical grooves. Residues 248-280 form a 
positively charged patch similar to that in the 
infectious bronchitis virus NP.'® These patches 
form a large part of the basic helical groove observed 
in the crystal structure. We postulate that the basic 
helical groove may serve as the RNA attachment 
site, and the structure suggests a mechanism for 
helical RNA packaging in the virus. However, the 
octamer has not been observed in solution. In our 
DNA titration study, we found that the spin-spin 
relaxation time, T2, of the amide resonances 
decreases, upon the addition of the DNA oligomer, 
at a rate faster than that expected for sheer increases 
in molecular weight. We suspect that the DNA 
complex forms transient higher-order multimers in 
solution. 

Within the helical oligomer model, one expects 
nucleic acid binding to stabilize the oligomer struc- 
ture. To investigate the consistency of our NMR 
chemical shift perturbation data with the proposed 
helical model, we mapped the spatial locations of 
the residues perturbed by dT20 binding onto the
helical model proposed by Chen et al. (Fig. 10).°
It clearly shows that both the N-terminal residues and
the additional perturbed residues in the β-sheet
region, namely, R320 and H335, form the interior 
lining of the positively charged groove. The two 
N-termini of the dimer reside on the outside edge 

Fig. 8. EMSA of SARS-CoV NP CTD mutants. (a) Mobility shift of dT20 bound to wild-type (wt), K257R/K258R
(K257/258R), K257Q/K258Q (K257/258Q), R320A, and H335A mutant proteins. The protein concentration was increased
by a factor of 2, starting from lane 1 (439 nM) to lane 11 (0.45 mM). Lane C, negative control. (b) Binding curve of the
K257/K258 double mutant towards dT20, compared to that of the wild-type protein. (c) Binding curve of the R320A and
H335A mutants towards dT20, compared to that of the wild-type protein. Each curve in (b) and (c) represents the best fit
from three independent assays. Results are summarized in Table 2.

and in the innermost part of the groove, respectively.
The arrangement of the helical structure is such that 
the interior N-termini are still solvent-accessible. 
On the other hand, the perturbed β-sheet residues
reside in the midregion of the groove. Thus, while 
the current NMR titration results do not prove the 
validity of the helical model, they can still be satis- 
factorily accommodated within this framework. 
More rigorous data, such as the determination of 
the structure of the CTD/nucleic acid complex, are 
necessary to provide definitive proof of the helical 
model. 


Oligonucleotide binding and structural disorder 


We have previously shown that the SARS-CoV NP 
is a modular protein comprising two independent 
structural domains connected by a 66-residue linker 
and flanked on each end by the long, disordered N- 
and C-termini, which comprise 44 and 57 residues, 
respectively’ (Fig. 1a). The NTD, comprising resi- 
dues 45-181, has been shown to bind to RNA.* Here 
we showed that the CTD also binds to DNA with 
similar affinity to that of the NTD. We have shown 
previously that the di-domain fragment NP45-365 
binds to DNA and RNA with higher affinity than 
that of the respective NTD or CTD.!° Taken together, 
all of these observations suggest that the SARS-CoV 
NP binds to RNA at multiple sites, and the binding 
strength is enhanced by the multivalency effect, as 
multiple binding sites are in contact with the RNA 
molecule.'” This charge-based nonspecific binding 
mode works in conjunction with intrinsic disorder 
to confer two main advantages to the nonspecific 
binding of oligonucleotides to the CTD. First, 
because the disordered region is not locked into a 
single conformation, binding to a variety of partners 
can occur regardless of the structural features of 
the partner, as long as the electrostatic interaction 
provides enough free energy to maintain the bound 
state. This property allows the CTD to bind to oligo- 
nucleotides with different sequences or tertiary 
structures. This is an important feature of RNA 
chaperones, of which the SARS-CoV NP is a 
member, and hints at the possibility that residues 
248-270 are involved in the process.*°! Second, the 
unstructured protein molecule can have a greater 
capture radius for a specific binding site than that of 
the folded state with its restricted conformational 
freedom, the so-called “fly-casting mechanism.””* In 
this binding scenario, the unfolded state binds 
weakly at a relatively large distance, and then folds 
as the protein approaches the binding site. These two 
advantages could act together to ensure that the CTD 
is able to bind to a variety of nucleotide sequences 
with enough affinity to carry out its function, 
namely, the encapsulation of the viral genome. We 
envision the extended conformation of the NP 
molecule as a whole to facilitate its initial contact 
with the RNA molecule in a fly-casting mechanism. 
Subsequent rearrangement of the NP molecule in the 
RNA framework then results in favorable packing of 
the complex in a helical form. 


Materials and Methods 


Site-directed mutagenesis 


The SARS-CoV NP CTD was cloned from SARS-CoV 
TW1 strain sequencing vectors (a gift from Dr. P.-J. Chen, 
National Taiwan University Hospital) as previously 
described.® Mutants of the SARS-CoV NP CTD were 
produced with a QuikChange II kit (Stratagene, La Jolla,
CA) on a RoboCycler 96 (Stratagene), in accordance with 
the manufacturer’s recommendations. Primers used for 
mutagenesis were purchased from Mission Biotech 
(Taiwan). Mutations were confirmed through DNA 
sequencing. 


Sample preparation 


The SARS-CoV NP CTD, encompassing residues 248-365
and including an extra MHHHHHHAMG sequence at
the N-terminus, was expressed in the E. coli BL21 (DE3) 
strain for nonlabeled and uniformly labeled samples, as 
described previously,° and in a cell-free reaction for the 
SAIL samples. The production of nonlabeled and uni- 
formly labeled samples by in vivo expression was 
performed in a conventional manner. The proteins 
expressed in E. coli were purified in accordance with our 
previously described protocol.° The cell-free expression of 
the SARS-CoV NP CTD was performed as described 
previously.’ The S30 extract containing minimal residual
amino acids was used for the cell-free expression. In the 
cell-free synthesis of the SARS-CoV NP CTD, the concen- 
tration of each SAIL amino acid was set to 0.5 mM, and 
2.3 mg of the SAIL-SARS-CoV NP CTD was obtained from 
a total of 70 mg of SAIL amino acid mixture. SAIL amino acids
were obtained from SAIL Technologies, Inc. The SAIL-
SARS-CoV NP CTD thus produced was mainly in soluble 
form. The SAIL protein was purified by Ni-NTA affinity 
chromatography in 50 mM sodium phosphate (pH 7.4) and 
150 mM NaCl, followed by gel filtration in a buffer 
containing 50 mM sodium phosphate (pH 7.4), 150 mM 
NaCl, and 1 mM ethylenediaminetetraacetic acid (EDTA). 
The eluted SAIL-SARS-CoV NP CTD was then concen- 
trated and exchanged with the NMR buffer. 


NMR spectroscopy 


The SAIL-SARS-CoV NP CTD samples contained
0.5 mM (10% D2O buffer) and 0.5 mM (100% D2O buffer)
of the SAIL-SARS-CoV NP CTD in NMR buffer [10 mM
sodium phosphate pH 6.0, 50 mM NaCl, 1 mM EDTA,
1 mM 2,2-dimethyl-2-silapentane-5-sulfonate, 0.01%
NaN3, 10% D2O, and Complete Mini protease inhibitor
mix (Roche)]. SAIL-adapted NMR experiments for the 
structure determination were performed at 30 °C with 
Bruker 600-MHz or 800-MHz spectrometers equipped 
with a TXI triple resonance room-temperature probe or a 
cryoprobe. 1H-15N HSQC spectra were obtained with a
1 mM 15N-labeled sample in NMR buffer on a Bruker
Avance 500-MHz spectrometer equipped with a TXI
cryoprobe, using an in-house adaptation of the pulse
sequence. For mutant characterization and protein–ssDNA
binding studies, 15N-labeled
samples were prepared in NMR buffer, and spectra were
recorded on Bruker Avance 600-MHz or 800-MHz
spectrometers equipped with QXI quadruple resonance or
TXI probes. The acquired data were processed with the
XwinNMR suite (Bruker Biospin, Germany) or with iNMR
(Nucleomatica, Italy), and chemical shift assignments
were performed. The chemical shifts were referenced to
2,2-dimethyl-2-silapentane-5-sulfonate and deposited in
BioMagResBank.


Fig. 10. Spatial locations of the nucleic-acid-binding sites in the helical packing model of the SARS-CoV NP CTD
crystal. (a) Binding sites are shown in CPK models, with the N-terminal residues in magenta and with the residues on the
β-sheet (R320, H335, and A337) in green. The rest of the molecules are shown in a gray ribbon representation, except for
the β-sheets, which are in cyan. (b) Surface charge representation of the proposed helical supramolecular complex
(adapted from Chen et al.”). The yellow and orange lines represent viral RNA strands. Notice that the binding sites in (a)
are located in the positively charged grooves within the supramolecular complex.


Structure calculation and refinement 


The NMR structure calculation of the SARS-CoV NP 
CTD was started using 28 independent pairs of inter- 
subunit distance restraints obtained from isotope-filtered 
NOE spectroscopy experiments reported previously.° 
Automated NOE cross-peak assignments’ and structure 
calculations with torsion-angle dynamics'® were per- 
formed using a modified version of the program 
CYANA 2.1, which incorporates SAIL labeling patterns,’ 
takes the homodimer symmetry explicitly into account for 
the network anchoring of NOE assignments,'” ensures an
identical conformation of the two monomers by imposing
torsion-angle difference restraints on all corresponding
torsion angles, and maintains a symmetric relative
orientation of the two monomers by applying distance
difference restraints between symmetry-related intermolecular
Cα–Cα distances. Backbone torsion-angle restraints
obtained from database searches with the program 
TALOS"” were incorporated into the structural calcula- 
tion. Hydrogen-bond restraints were not used. CYANA 
structure calculations were started from 100 randomized 
conformers, and simulated annealing with 20,000 torsion- 
angle dynamics time steps per conformer was performed. 
The 20 conformers with the lowest final CYANA target 
function values were subjected to restrained energy refine- 
ment in explicit solvent against the AMBER force field.'® 
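
The selection step described above (keeping the 20 conformers with the lowest final target function values out of 100 calculated) can be sketched as follows. This is only an illustration of the ranking logic, not the CYANA implementation: the function name `select_lowest_target`, the conformer labels, and the target function values are all hypothetical.

```python
import random
from typing import List, Tuple

Conformer = Tuple[str, float]  # (label, final target function value)


def select_lowest_target(conformers: List[Conformer], keep: int = 20) -> List[Conformer]:
    """Return the `keep` conformers with the lowest target function values."""
    return sorted(conformers, key=lambda c: c[1])[:keep]


# Hypothetical stand-in data: 100 conformers with made-up target values.
rng = random.Random(0)
conformers = [(f"conf_{i:03d}", rng.uniform(0.5, 50.0)) for i in range(100)]

# The retained bundle would then be passed on to energy refinement.
best = select_lowest_target(conformers, keep=20)
```

Ranking by the final target function value is a simple filter on how well each conformer satisfies the restraints; it says nothing about the subsequent refinement in explicit solvent.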


Fig. 9. Structure perturbation of R320A and H335A mutants. (a) Overlay of 15N-edited HSQC spectra from the wild-type
CTD (blue) and the R320A mutant (red). Affected resonances are identified by their respective residue types and
numbers in the wild-type protein. (b) Same as in (a), but with the wild-type CTD (blue) and the H335A mutant (magenta).
(c) Mapping of residues affected by the R320A mutation (red) in the solution structure of the SARS-CoV NP CTD dimer.
The side chains of R320 are shown in a neon representation. (d) Same as in (c), but showing the residues affected by the
H335A mutation (in magenta). The side chain of H335 is also shown.


CSD studies


A series of 2D 15N-edited HSQC spectra of uniformly
15N-labeled SARS-CoV NP CTD protein (0.5 mM) was
recorded in NMR buffer by titrating in different amounts
of poly-dT ssDNA (Purigo, Taiwan). The affected amide 
correlations experienced CSD upon the addition of 
ssDNA. The unaffected and shifted resonances in 
uncrowded regions were easily assigned, whereas the 
shifted resonances in crowded regions were assigned by 
stepwise titration of the protein with small amounts of 
ssDNA and by tracing the changes in the CSD until the
desired final concentration was achieved. The final protein/
ssDNA ratio was 1:1 for 10-mer and 4:1 for 20-mer.
Protein/ssDNA ratios higher than those presented here
resulted in the formation of a precipitate within the 
sample. The weighted CSD for each residue was calculated
with the formula: $\mathrm{CSD} = \left[\tfrac{1}{2}\left((\Delta\delta_{\mathrm{HN}})^2 + (\Delta\delta_{\mathrm{N}}/5)^2\right)\right]^{1/2}$,
where $\Delta\delta$ denotes the chemical shift difference between
the final complex and the free protein resonances. The
experimental error in the weighted CSD from the spectral
resolution was calculated as: $\left[\tfrac{1}{2}\left((\mathrm{SW}_{\mathrm{HN}}/\text{points in } {}^{1}\mathrm{H})^2 + ((\mathrm{SW}_{\mathrm{N}}/\text{points in } {}^{15}\mathrm{N})/5)^2\right)\right]^{1/2}$,
where SW denotes the total spectral width of the dimension. Amides with
CSD values larger than the average shift of all the peaks
plus the experimental error were selected as affected.
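
As a minimal sketch of the calculation just described, the weighted CSD, the resolution-derived error, and the selection threshold (mean CSD plus error) can be written as below. The function names, the residue labels, and all numerical values are hypothetical and serve only to illustrate the arithmetic; shifts and spectral widths are assumed to be in ppm.

```python
import math
from typing import Dict, List, Tuple


def weighted_csd(d_hn: float, d_n: float, n_scale: float = 5.0) -> float:
    """Weighted CSD: sqrt(1/2 * (d_HN^2 + (d_N / 5)^2))."""
    return math.sqrt(0.5 * (d_hn ** 2 + (d_n / n_scale) ** 2))


def resolution_error(sw_hn: float, points_h: int, sw_n: float, points_n: int,
                     n_scale: float = 5.0) -> float:
    """Error from digital resolution (spectral width / points), weighted like the CSD."""
    res_h = sw_hn / points_h
    res_n = sw_n / points_n
    return math.sqrt(0.5 * (res_h ** 2 + (res_n / n_scale) ** 2))


def affected_residues(shifts: Dict[str, Tuple[float, float]], error: float) -> List[str]:
    """Residues whose CSD exceeds the mean CSD of all peaks plus the error."""
    csd = {res: weighted_csd(d_hn, d_n) for res, (d_hn, d_n) in shifts.items()}
    threshold = sum(csd.values()) / len(csd) + error
    return sorted(res for res, value in csd.items() if value > threshold)


# Hypothetical shift differences (1H, 15N) in ppm for three made-up peaks.
example_shifts = {"peakA": (0.10, 0.50), "peakB": (0.01, 0.05), "peakC": (0.0, 0.0)}
# Hypothetical acquisition parameters: 14 ppm over 2048 points, 35 ppm over 256 points.
example_error = resolution_error(14.0, 2048, 35.0, 256)
selected = affected_residues(example_shifts, example_error)
```

Dividing the 15N term by 5 puts the two nuclei on a comparable scale, so that the larger 15N shift range does not dominate the combined value.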


EMSA 


All experiments were conducted in NMR buffer with 6- 
aminohexylfluorescein-labeled ssDNA (Purigo). Reactions
were set up in 20-μl aliquots, each containing
50 nM 6-aminohexylfluorescein-labeled ssDNA. Protein
was added to the aliquots starting at a concentration of
500 μM, with each following aliquot containing a 2-fold
serial dilution of the protein. A control reaction was set 
up where only ssDNA and buffer were added. The 
aliquots were allowed to react at room temperature for 
30 min, and then were loaded on a 0.5× Tris–borate–
EDTA buffer DNA retardation gel (Invitrogen, Carlsbad,
CA). The gel was run at 30 V and 4 °C for 2.5 h, and the 
bands were visualized with a Typhoon 9410 variable 
mode imager (Amersham Biosciences, Piscataway, NJ). 
Quantitation of the free ssDNA band was performed
with the ImageJ software (National Institutes of
Health, Bethesda, MD). Bound ssDNA was estimated
by subtracting the free ssDNA band of each reaction from 
that of the control lane. The fraction of bound ssDNA was 
fitted against the equation: $Y = 1/(1 + (K_d/X)^n)$, using
GraphPad Prism (GraphPad Software, San Diego, CA),
where Y is the fraction of ssDNA bound to the protein, X
is the protein concentration, $K_d$ is the dissociation
constant, and n is the Hill coefficient. All experiments
were repeated twice. 
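
The EMSA analysis above can be illustrated with a short sketch covering the 2-fold dilution series, the bound fraction estimated from the free ssDNA band, and the Hill equation. The fitting in this work was done in GraphPad Prism; the sketch instead uses a linearized Hill plot, log(Y/(1-Y)) = n log X - n log Kd, solved by ordinary least squares, which is a simpler substitute and not the method used here. All function names and numerical values are hypothetical, and normalizing the bound signal by the control band intensity is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def serial_dilution(start: float, steps: int, factor: float = 2.0) -> List[float]:
    """Concentrations of a serial dilution, highest first (e.g. 500, 250, 125, ...)."""
    return [start / factor ** i for i in range(steps)]


def fraction_bound(free_band: float, control_band: float) -> float:
    """Bound = control - free; normalizing by the control band is an assumption."""
    return (control_band - free_band) / control_band


def hill(x: float, kd: float, n: float) -> float:
    """Hill equation: Y = 1 / (1 + (Kd / X)^n)."""
    return 1.0 / (1.0 + (kd / x) ** n)


def fit_hill_linear(conc: Sequence[float], frac: Sequence[float]) -> Tuple[float, float]:
    """Estimate (Kd, n) from a least-squares line through the Hill plot."""
    pts = [(math.log(x), math.log(y / (1.0 - y)))
           for x, y in zip(conc, frac) if 0.0 < y < 1.0]
    mean_x = sum(p[0] for p in pts) / len(pts)
    mean_y = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    n = sxy / sxx                      # slope is the Hill coefficient
    intercept = mean_y - n * mean_x    # intercept equals -n * log(Kd)
    return math.exp(-intercept / n), n


# Hypothetical, noise-free data: 10 lanes starting at 500 (in uM), Kd = 20, n = 1.5.
conc = serial_dilution(500.0, 10)
frac = [hill(x, 20.0, 1.5) for x in conc]
kd_fit, n_fit = fit_hill_linear(conc, frac)
```

With real gel data, lanes at or near 0% or 100% binding are poorly determined in the Hill plot, which is one reason a direct nonlinear fit of the kind used in this work is generally preferred.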


PDB accession codes 


Chemical shift assignments and atomic coordinates 
have been deposited in BioMagResBank (accession code 
15511) and PDB (accession code 2jw8), respectively. 
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